    Further Investigation of the Survivability of Code Technical Debt Items

    Full text link
    Context: Technical Debt (TD) describes the negative impact of sub-optimal decisions made to cope with the need for speed in software development. Code Technical Debt Items (TDIs) are atomic elements of TD that can be observed in code artefacts. Empirical results on open-source systems have demonstrated how code smells, which are just one type of TDI, are introduced and "survive" during release cycles. However, little is known about whether the results on the survivability of code smells hold for other types of code TDIs (i.e., bugs and vulnerabilities) and in industrial settings. Goal: To understand the survivability of code TDIs by conducting an empirical study analysing two industrial cases and 31 open-source systems from the Apache Foundation. Method: We analysed 133,670 code TDIs (35,703 from the industrial systems) detected by SonarQube (in 193,196 commits) to assess their survivability using survival models. Results: In general, code TDIs tend to linger for long periods in open-source systems, whereas they are removed faster in industrial systems. Code TDIs that survive beyond a certain threshold tend to remain much longer, which confirms previous results. Our results also suggest that bugs tend to be removed faster, while code smells and vulnerabilities tend to survive longer. Comment: Submitted to the Journal of Software: Evolution and Process (JSME)
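
    A minimal sketch of the kind of survival analysis the abstract describes, assuming TDI lifetimes (in days) and removal flags have already been extracted from SonarQube's issue history; the data values, column names and the use of the lifelines library are illustrative assumptions, not the paper's exact setup.

```python
# Minimal Kaplan-Meier sketch for code-TDI survivability.
# Assumes a table of TDIs with observed lifetimes in days and a flag
# indicating whether the TDI was removed (event) or is still open (censored).
import pandas as pd
from lifelines import KaplanMeierFitter

# Illustrative data: lifetime_days = time from introduction to removal
# (or to the last analysed commit, if still open); removed = event indicator.
tdis = pd.DataFrame({
    "lifetime_days": [12, 340, 95, 710, 28, 1500, 60],
    "removed":       [1,   1,   1,   0,   1,    0,   1],
    "type": ["bug", "code_smell", "bug", "vulnerability",
             "bug", "code_smell", "vulnerability"],
})

# Fit one survival curve per TDI type to compare how long each type lingers.
for tdi_type, group in tdis.groupby("type"):
    kmf = KaplanMeierFitter()
    kmf.fit(group["lifetime_days"], event_observed=group["removed"],
            label=tdi_type)
    print(tdi_type, "median survival:", kmf.median_survival_time_)
```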

    Rework Effort Estimation of Self-admitted Technical Debt

    Get PDF
    Programmers sometimes leave incomplete or temporary workarounds and buggy code that require rework. This phenomenon in software development is referred to as Self-admitted Technical Debt (SATD). The challenge for software engineering researchers and practitioners is therefore to resolve the SATD problem in order to improve software quality. We performed an exploratory study using a text mining approach to extract SATD from developers' source code comments and implemented an effort metric to compute the rework effort that might be needed to resolve the SATD problem. The results of this study confirm a prior finding that design debt is the most predominant class of SATD. The results also indicate that a significant rework effort of between 13 and 32 commented LOC on average per SATD-prone source file is required to resolve the SATD challenge across all four projects considered. The text mining approach incorporated into the rework effort metric will speed up the extraction and analysis of SATD generated during software projects. It will also aid managerial decisions on whether to handle SATD as part of ongoing project development or defer it to the maintenance phase.
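
    As a rough illustration of the text-mining step, the sketch below flags SATD with a keyword pattern over single-line comments and counts the flagged commented LOC as a crude effort proxy; the patterns, helper names and the proxy itself are illustrative assumptions rather than the paper's exact metric.

```python
# Keyword-based SATD detection over source comments, a common text-mining
# baseline; patterns and the effort proxy here are illustrative only.
import re

SATD_PATTERNS = re.compile(r"\b(todo|fixme|hack|workaround|temporary|xxx)\b",
                           re.IGNORECASE)

def satd_comments(source: str) -> list[str]:
    """Return single-line (// or #) comments that self-admit technical debt."""
    pairs = re.findall(r"//(.*)|#(.*)", source)
    flat = [c for pair in pairs for c in pair if c]
    return [c.strip() for c in flat if SATD_PATTERNS.search(c)]

code = """
int total = 0; // TODO: handle overflow, temporary workaround
# FIXME: this hack ignores locale settings
x = 1  # fine as-is
"""

found = satd_comments(code)
# The count of SATD-flagged commented LOC serves as a crude rework-effort proxy.
print(len(found), "SATD comment(s):", found)
```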

    Multi-Objective Optimization for Software Testing Effort Estimation

    Get PDF
    Software Testing Effort (STE), which contributes about 25-40% of total development effort, plays a significant role in software development. Cross-company (CC) modeling can be leveraged to address the difficulty companies face in finding relevant datasets for STE estimation modeling prior to development. This study assesses the effectiveness of CC and within-company (WC) projects in STE estimation. A robust multi-objective Mixed-Integer Linear Programming (MILP) optimization framework for the selection of CC and WC projects was constructed, and STE was estimated using Deep Neural Networks. Our results indicate that applying the MILP framework yielded similar results for both WC and CC modeling. The modeling framework will serve as a foundation to assist in STE estimation prior to the development of a new software project.
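
    As a sketch of how such project selection could be posed as a MILP, the toy model below picks cross-company projects whose total size best matches a target project; it collapses the paper's multi-objective framework into a single objective, and the PuLP solver, variable names and data are all assumptions for illustration.

```python
# Simplified single-objective stand-in for a multi-objective MILP:
# select a subset of candidate projects whose size profile best matches
# the target project, subject to a cap on how many are selected.
# Requires: pip install pulp
import pulp

# Illustrative candidate projects (name -> size in KLOC) and a target size.
candidates = {"p1": 12.0, "p2": 45.0, "p3": 30.0, "p4": 8.0, "p5": 50.0}
target_kloc, max_selected = 40.0, 2

prob = pulp.LpProblem("project_selection", pulp.LpMinimize)
pick = {p: pulp.LpVariable(f"pick_{p}", cat="Binary") for p in candidates}
dev = pulp.LpVariable("abs_deviation", lowBound=0)

# Linearised |selected_total - target| via two inequalities.
total = pulp.lpSum(size * pick[p] for p, size in candidates.items())
prob += dev                          # objective: minimise the deviation
prob += total - target_kloc <= dev
prob += target_kloc - total <= dev
prob += pulp.lpSum(pick.values()) <= max_selected

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([p for p in candidates if pick[p].value() == 1])
```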

    Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

    Full text link
    We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and did not (b) study how variations in the data affect the results. Hence, this paper applies (a) multi-criteria tests while (b) fixing the weaker regions of the training data (using SMOTUNED, a self-tuning version of SMOTE). This approach leads to dramatically large improvements in software defect prediction. When applied in a 5×5 cross-validation study of 3,681 Java classes (containing over a million lines of code) from open-source systems, SMOTUNED increased AUC and recall by 60% and 20%, respectively. These improvements are independent of the classifier used to predict quality. The same pattern of improvement was observed when SMOTE and SMOTUNED were compared against the most recent class-imbalance technique. In conclusion, for software analytics tasks like defect prediction, (1) data pre-processing can be more important than classifier choice, (2) ranking studies are incomplete without such pre-processing, and (3) SMOTUNED is a promising candidate for pre-processing. Comment: 10 pages + 2 references. Accepted to the International Conference on Software Engineering (ICSE), 201
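
    SMOTUNED tunes SMOTE's parameters automatically (the paper uses differential evolution); the sketch below conveys the same idea with a plain grid search over SMOTE's k_neighbors and sampling ratio, scored by validation AUC. The synthetic data and the choice of random forest are illustrative assumptions.

```python
# Tuning SMOTE's parameters against a held-out criterion: a grid-search
# stand-in for the differential evolution used by SMOTUNED.
# Requires: pip install scikit-learn imbalanced-learn
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (~90/10) standing in for defect datasets.
X, y = make_classification(n_samples=600, weights=[0.9], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

best_auc, best_params = -1.0, None
for k in (1, 3, 5, 7):               # neighbours used to synthesise samples
    for ratio in (0.5, 0.75, 1.0):   # minority/majority ratio after SMOTE
        Xs, ys = SMOTE(k_neighbors=k, sampling_strategy=ratio,
                       random_state=0).fit_resample(X_tr, y_tr)
        clf = RandomForestClassifier(random_state=0).fit(Xs, ys)
        auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
        if auc > best_auc:
            best_auc, best_params = auc, (k, ratio)

print(f"best AUC {best_auc:.3f} with k={best_params[0]}, ratio={best_params[1]}")
```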

    A Systematic Review of Online Speech Therapy Systems for Intervention in Childhood Speech Communication Disorders

    No full text
    Currently, not all children that need speech therapy have access to a therapist. With the current international shortage of speech–language pathologists (SLPs), there is a demand for online tools to support SLPs with their daily tasks. Several online speech therapy (OST) systems have been designed and proposed in the literature; however, the implementation of these systems is lacking, and the technical knowledge needed to use them is a challenge for SLPs. There has been limited effort to systematically identify, analyze and report the findings of prior studies. We provide the results of an extensive literature review of OST systems for childhood speech communication disorders. We systematically review OST systems that can be used in clinical settings or from home as part of a treatment program for children with speech communication disorders. Our search strategy found 4481 papers, of which 35 were identified as focusing on speech therapy programs for speech communication disorders. The features of these programs were examined, and the main findings are extracted and presented. Our analysis indicates that most systems, which are designed mainly to support SLPs, use supervised machine learning approaches and are either desktop-based or mobile-phone-based applications. Our findings reveal that speech therapy systems can provide important benefits for childhood speech. Collaboration between computer programmers and SLPs can contribute to implementing useful automated programs, leading to more children having access to good speech therapy.


    Critical Infrastructures: Reliability, Resilience and Wastage

    No full text
    By 2050, according to the UN medium forecast, 68.6% of the world's population will live in cities. This growth will place a strain on critical infrastructure distribution networks, which already operate in a state that is complex and intertwined within society. In order to create a sustainable society, there needs to be a change both in societal behaviours (for example, reducing water, energy or food waste) and in the future use of smart technologies. The main challenges are a limited aggregated understanding of current waste behaviours within critical infrastructure ecosystems and a lack of technological solutions to address this. Therefore, this article reflects on theoretical and applied works concerning waste behaviours, the reliability/availability and resilience of critical infrastructures, and the use of advanced technologies for reducing waste. Articles in the Scopus digital library are considered in the investigation, with 51 papers selected by means of a systematic literature review, from which 38 strains, 86 barriers and 87 needs are identified, along with 60 methods of analysis. The focus of the work is primarily on the behaviours, barriers and needs that create an excess or wastage.

    Inter-release defect prediction with feature selection using temporal chunk-based learning: An empirical study

    No full text
    Inter-release defect prediction (IRDP) is a practical scenario that employs the datasets of the previous release to build a prediction model and predicts defects for the current release within the same software project. A practical software project goes through several releases, with the data of each release arriving as chunks in temporal order. The evolving data of each release introduces new concepts to the model, a phenomenon known as concept drift, which negatively impacts the performance of IRDP models. In this study, we aim to examine and assess the impact of feature selection (FS) on the performance of IRDP models and on the robustness of the models to concept drift. We conduct empirical experiments using 36 releases of 10 open-source projects. The Friedman and Nemenyi post-hoc test results indicate statistically significant differences between the prediction results with and without FS techniques. IRDP models trained on the data of the most recent releases were not always the best models. Furthermore, prediction models trained with carefully selected features could help mitigate concept drift.
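
    A minimal sketch of the inter-release setup follows: each model is trained on the previous release's chunk, with a filter-based feature selection step, and tested on the next release. SelectKBest and the synthetic chunks are stand-ins for the FS techniques and project data used in the study.

```python
# Inter-release setup: train on release N-1, test on release N, with a
# simple filter-based feature selection step before fitting.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Illustrative release chunks arriving in temporal order: (features, labels).
releases = [(rng.normal(size=(200, 20)), rng.integers(0, 2, 200))
            for _ in range(3)]

for (X_tr, y_tr), (X_te, y_te) in zip(releases, releases[1:]):
    # Select features on the training chunk only, then apply to the next one.
    fs = SelectKBest(mutual_info_classif, k=8).fit(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(fs.transform(X_tr), y_tr)
    pred = clf.predict(fs.transform(X_te))
    print("F1 on next release: %.2f" % f1_score(y_te, pred))
```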

    Duplex Output Software Effort Estimation Model with Self-guided Interpretation

    No full text
    Context: Software effort estimation (SEE) plays a key role in predicting the effort needed to complete a software development task. However, conclusion instability across learners has affected the adoption of SEE models. This instability can be attributed to the lack of an effort classification benchmark that software researchers and practitioners can use to facilitate and interpret prediction results. Objective: To ameliorate the conclusion instability challenge by introducing a classification and self-guided interpretation scheme for SEE. Method: We first used the density quantile function to discretise the effort recorded in 14 datasets into three classes (high, low and moderate) and built regression models for these datasets. The result of each regression model is an effort estimate, termed output 1, which is then classified into an effort class, termed output 2. We refer to the models generated in this study as duplex output models because they return two outputs. The introduced duplex output models, trained with leave-one-out cross-validation and evaluated with MAE, BMMRE and adjusted R², can be used to predict both the software effort and the class of the software effort estimate. Robust statistical tests (Welch's t-test and the Kruskal-Wallis H-test) were used to examine statistically significant differences in the models' prediction performance. Results: We observed the following: (1) the duplex output models not only predicted the effort estimates but also offered a guide to interpreting the effort expended; (2) incorporating a genetic search algorithm into the duplex output model allowed the sampling of relevant features for improved prediction accuracy; and (3) ElasticNet, a hybrid regression, provided superior prediction accuracy over ATLM, the state-of-the-art baseline regression. Conclusion: The results show that the duplex output model provides a self-guided benchmark for interpreting estimated software effort. ElasticNet can also serve as a baseline model for SEE.
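
    A compact sketch of the duplex output idea, under simplifying assumptions: tertile cut-points from the training efforts stand in for the density quantile discretisation, and synthetic data replaces the 14 SEE datasets. ElasticNet, LOOCV and MAE follow the abstract.

```python
# Duplex-output sketch: an ElasticNet regressor yields the effort estimate
# (output 1), which quantile thresholds from the training efforts then map
# to a low/moderate/high class (output 2). Evaluated with LOOCV and MAE.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                      # synthetic project features
effort = np.exp(X @ rng.normal(size=5) + rng.normal(scale=0.2, size=30))

def effort_class(value, lo, hi):
    return "low" if value <= lo else ("moderate" if value <= hi else "high")

abs_errors = []
for train, test in LeaveOneOut().split(X):
    model = ElasticNet(alpha=0.1).fit(X[train], effort[train])
    estimate = float(model.predict(X[test])[0])           # output 1
    lo, hi = np.quantile(effort[train], [1 / 3, 2 / 3])   # tertile cut-points
    label = effort_class(estimate, lo, hi)                # output 2
    abs_errors.append(abs(estimate - effort[test][0]))

print("LOOCV MAE: %.2f; last estimate classed as %s" % (np.mean(abs_errors), label))
```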

    An empirical study on the effectiveness of data resampling approaches for cross-project software defect prediction

    No full text
    Crossā€project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN)Filter approach have shown promising results in recent studies. A key challenge with defectā€prediction datasets is class imbalance, that is, highly skewed datasets where nonbuggy modules dominate the buggy modules. In the past, data resampling approaches have been applied to withinā€projects defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderlineā€SMOTE, Random Oversamplingand ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links and Oneā€sided selection) is investigated and results are compared to approaches without data resampling. The authors examined six defect prediction models on34 datasets extracted from the PROMISE repository. The authors' results show that there is a significant positive effect of data resampling on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches for improved recall (pd) and gā€measure prediction performance. However, if the goal is to improve precision and reduce false alarm (pf) then data resampling approaches should be avoided.open access</p